Discovery of Anomalous Windows through a Robust Nonparametric Multivariate Scan Statistic (RMSS)
Abstract
This paper studies unusual phenomena by discovering anomalous windows in multivariate spatial data. Such an anomalous window is a group of contiguous spatial objects indicating the occurrence of an unusual phenomenon in terms of multiple variables. The paper presents a novel Robust non-parametric Multivariate Scan Statistic (RMSS). In contrast to existing work, the authors' approach is designed for anomalous window discovery in multivariate data. The proposed scan statistic employs the robust Mahalanobis distance, which takes multiple behavioral attributes and their correlations into account at the same time when discovering significant anomalous windows. The statistic is non-parametric in that it does not rely on any prior assumption about the data distribution. It is robust in that it can handle data with a large number of outliers, up to 50% of the overall data size. It is also affine equivariant, so that an affine transformation of the data, such as a stretch or a rotation, does not affect the results. The authors evaluate their approach with (a) real-world multivariate climate data for discovering natural disasters and climate changes, (b) real-world multivariate traffic accident data for identifying accident hubs, which are route segments with underlying accident-prone issues, and (c) synthetic data drawn from both continuous and discrete multivariate distributions for identifying clusters of known outliers under different outlier percentages. They compare their results to the state-of-the-art multivariate scan statistic method (Kulldorff et al., 2007). The evaluation shows the detection power of the authors' method and a significant improvement over the existing methods.
DOI: 10.4018/jdwm.2013010102
International Journal of Data Warehousing and Mining, 9(1), 28-55, January-March 2013
Copyright © 2013, IGI Global.
INTRODUCTION
The study of unusual phenomena (Shi & Janeja, 2009; Kulldorff, 1997; Schoier & Borruso, 2012) finds use in various applications related to spatial data (Silva, Moura-Pires, & Santos, 2012), such as the discovery of (i) a disease outbreak in a region (Kulldorff, 1997), (ii) accident hubs along highways (Shi & Janeja, 2009), and (iii) leaks of toxins in water or air, to name a few. Timely identification of such unusual phenomena is of utmost importance, so that corresponding solutions can be prepared and applied in time to avoid or reduce the resulting personnel and property losses.
Such unusual phenomena can be identified as anomalous windows in spatial data. An anomalous window is a group of contiguous spatial objects where the phenomenon takes place. These spatial objects are discovered by quantifying how unusual their behavior is with respect to that of the other spatial objects in the data. Traditional outlier detection approaches, such as nearest neighbor based methods (Knorr & Ng, 1997) or clustering based methods (Ester, Kriegel, Sander, & Xu, 1996), are not well suited to such a discovery. They are designed to identify individual outliers, and they consider neither the quantification of the unusualness of a whole window nor the spatial relationships between the objects that form it, which are the distinguishing characteristic of spatial data.
In contrast, the spatial scan statistic has proven to be a promising technique for quantifying anomalous windows. The traditional univariate spatial scan statistic (Kulldorff, 1997) discovers anomalous windows in practical applications by studying unusual behaviors of spatial objects in terms of one single attribute of interest.
For instance, in studying disease outbreaks, the attribute of interest is the number of people afflicted by a disease. In studying accident hubs along highways, the attribute of interest could be the number of fatalities or, alternatively, the number of crashes at a particular mile marker. However, such univariate spatial scan statistic methods cannot study multiple aspects of a phenomenon that has multi-dimensional or multi-domain influences. Studying multiple aspects of a phenomenon and their interaction is critical, as the phenomenon may not be measurable in a single dimension or even a single domain. For instance, the phenomenon of climate change can influence temperature, humidity, rainfall, wind, snow, etc. at a location. The underlying spatial processes governing such a phenomenon can thus influence multiple spatial properties of the region where it takes place. If an observed unusual temperature is a result of climate change, then the change will affect not only this aspect of the climate but also the other weather patterns in the region, or even other environmental aspects such as crop growth, animal ecosystems and so on. Studying climate change in a single aspect may not be sufficient or even feasible, as its effect may be hidden or, even if visible, may appear insignificant on its own; its combined influence on entire weather patterns and environmental patterns is much clearer and more convincing. Thus, to fathom the processes underlying a phenomenon that is difficult to study in a single dimension or single domain, one has to look, in a unified manner and with the analysis centered on space, at the multiple influences it has on the spatial properties of other attributes within one domain or even in other domains.
This can be achieved by extending the univariate spatial scan statistic to a multivariate spatial scan statistic for processing multivariate spatial data. Such multivariate spatial data can have attributes describing a variety of spatial properties in the region. Recent work on multivariate scan statistics (Kulldorff et al., 2007; Neill & Cooper, 2010) directly extends the univariate scan statistics ...
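The abstract's point that a Mahalanobis-type distance accounts for correlations between attributes and is unaffected by affine transformations can be illustrated with a small sketch. This is not the authors' RMSS: it uses the plain sample mean and covariance on two attributes, whereas the paper relies on a robust estimate whose details are not given in this excerpt. The data points, the function names (`mean_and_cov`, `mahalanobis_sq`, `affine_map`) and the transformation matrix are all hypothetical and chosen only for illustration.

```python
# Illustrative only: classical (non-robust) squared Mahalanobis distance for
# two attributes, written without third-party libraries. A robust method
# would replace the sample mean and covariance below with robust estimates.

# Hypothetical observations of two positively correlated attributes.
POINTS = [(-2.0, -1.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]


def mean_and_cov(points):
    """Return the sample mean and the (sxx, sxy, syy) sample covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def affine_map(point):
    """Apply a hypothetical invertible affine map: stretch, shear and shift."""
    x, y = point
    return (2.0 * x + y + 5.0, 3.0 * y - 7.0)


if __name__ == "__main__":
    center, cov = mean_and_cov(POINTS)
    along = (2.0, 1.0)     # far from the center, but follows the correlation
    against = (1.0, -1.0)  # closer to the center, but breaks the correlation
    print(mahalanobis_sq(along, center, cov))    # 2.0
    print(mahalanobis_sq(against, center, cov))  # 26.0

    # The same distances are obtained after transforming data and query.
    t_center, t_cov = mean_and_cov([affine_map(p) for p in POINTS])
    print(mahalanobis_sq(affine_map(against), t_center, t_cov))
```

The point `(1.0, -1.0)` is closer to the center in Euclidean terms than `(2.0, 1.0)`, yet it receives the larger distance because it contradicts the positive correlation between the two attributes. Because the sample mean and covariance are themselves sensitive to outliers, this classical version lacks the robustness the paper emphasizes.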
Similar resources
Fast generalized subset scan for anomalous pattern detection
We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimizat...
An Efficient Approach to Event Detection and Forecasting in Dynamic Multivariate Social Media Networks
Anomalous subgraph detection has been successfully applied to event detection in social media. However, the subgraph detection problem becomes challenging when the social media network incorporates abundant attributes, which leads to a multivariate network. The multivariate characteristic makes most existing methods incapable of tackling this problem effectively and efficiently, as it involves joi...
Optimal and Fast Detection of Spatial Clusters with Scan Statistics by Guenther Walther
We consider the detection of multivariate spatial clusters in the Bernoulli model with N locations, where the design distribution has weakly dependent marginals. The locations are scanned with a rectangular window with sides parallel to the axes and with varying sizes and aspect ratios. Multivariate scan statistics pose a statistical problem due to the multiple testing over many scan windows, a...
Bayesian Network Scan Statistics for Multivariate Pattern Detection
We review three recently proposed scan statistic methods for multivariate pattern detection. Each method models the relationship between multiple observed and hidden variables using a Bayesian network structure, drawing inferences about the underlying pattern type and the affected subset of the data. We first discuss the multivariate Bayesian scan statistic (MBSS) proposed by Neill and Cooper (...
Discretized Spatio-Temporal Scan Window
The focus of this paper is the discovery of anomalous spatio-temporal windows. We propose a Discretized Spatio-Temporal Scan Window approach to address the question of how we can treat Space and Time together without compromising on the properties of each and their impact on each other. In doing so we discover anomalous spatio-temporal windows, identify at what point in time the window changes, i...
Journal title:
- IJDWM
Volume 9, Issue 1
Pages 28-55
Publication date 2013